Integrated Feature Normalization and Enhancement for Robust Speaker Recognition Using Acoustic Factor Analysis
Abstract
State-of-the-art factor analysis based channel compensation methods for speaker recognition assume that speaker/utterance dependent Gaussian Mixture Model (GMM) mean super-vectors can be constrained to lie in a lower dimensional subspace. This assumption does not consider that conventional acoustic features may also be constrained in a similar way in the feature space. In this study, motivated by the low-rank covariance structure of cepstral features, we propose a factor analysis model in the acoustic feature space instead of the super-vector domain and derive a mixture dependent feature transformation. We demonstrate that the proposed Acoustic Factor Analysis (AFA) transformation performs feature dimensionality reduction, de-correlation, variance normalization and enhancement at the same time. The transform applies a square-root Wiener gain along the acoustic feature eigenvector directions and is similar to signal subspace based speech enhancement schemes. We also propose several methods of adaptively selecting the AFA parameter for each mixture. The proposed feature transform is applied using a probabilistic mixture alignment and is integrated with a conventional i-Vector system. Experimental results on the telephone trials of the NIST SRE 2010 demonstrate the effectiveness of the proposed scheme.
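To make the combination of dimensionality reduction, de-correlation, variance normalization and square-root Wiener gain concrete, the following is a minimal sketch for a single mixture component. It assumes a probabilistic PCA style factor model with isotropic noise, which may differ from the paper's exact formulation; the function name `afa_like_transform`, the parameter `q` (number of retained directions) and the noise estimate (mean of the discarded eigenvalues) are illustrative assumptions, not taken from the source. The mixture-dependent transforms and the probabilistic mixture alignment mentioned in the abstract are not shown.

```python
import numpy as np

def afa_like_transform(features, q):
    """Sketch of a single-component, PPCA-style feature transform.

    features: array of shape (n_frames, d), e.g. cepstral vectors.
    q: number of eigen-directions to keep (hypothetical parameter, q < d).
    Returns (A, mu, gain): A has shape (q, d); apply as (x - mu) @ A.T.
    """
    d = features.shape[1]
    if not 0 < q < d:
        raise ValueError("q must satisfy 0 < q < d")

    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)

    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumption: isotropic noise variance = mean of discarded eigenvalues.
    noise_var = eigvals[q:].mean()
    lam = eigvals[:q]

    gain = np.sqrt((lam - noise_var) / lam)  # square-root Wiener gain
    whiten = 1.0 / np.sqrt(lam)              # variance normalization

    # Rows: rotate onto eigenvectors (de-correlation), then scale.
    A = (gain * whiten)[:, None] * eigvecs[:, :q].T
    return A, mu, gain
```

Under these assumptions the transformed features `(x - mu) @ A.T` have a diagonal sample covariance whose entries equal the squared gains, so directions dominated by the noise floor are attenuated while the retained directions are de-correlated and scaled.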
Similar resources
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is still a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Advanced Feature Normalization and Rapid Model Adaptation for Robust In-Vehicle Speech Recognition
In this study, we present advanced feature normalization and rapid model adaptation for robust in-vehicle speech recognition. For feature normalization, we use a combination of recently established quantile-based cepstral dynamics normalization (QCN) and low pass temporal filtering (RASTALP). Similar to cepstral mean normalization (CMN), QCN aims at alleviating the mismatch between ASR acoustic...
An investigation of likelihood normalization for robust ASR
Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques have been defined in the field of speaker verification that operate on the resulting log-likelihood scores. In this paper, we...
Speaker Identification Using Ensembles of Feature Enhancement Methods
In this paper, we propose a classifier ensemble of various channel compensation and feature enhancement methods for robust speaker identification in various environments. The proposed ensemble system is constructed with 15 classifiers including three channel compensation methods (including CMS and variance normalization, and without compensation) and five feature enhancement methods (including ...